
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos



Abstract

Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background contents, we need not only to recognize their action categories, but also to localize the start time and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments of pre-determined boundaries. However, a desirable model should move beyond segment-level and make dense predictions at a fine granularity in time to determine precise temporal boundaries. To this end, we design a novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data. The proposed CDC filter performs the required temporal upsampling and spatial downsampling operations simultaneously to predict actions at the frame-level granularity. It is unique in jointly modeling action semantics in space-time and fine-grained temporal dynamics. We train the CDC network in an end-to-end manner efficiently. Our model not only achieves superior performance in detecting actions in every frame, but also significantly boosts the precision of localizing temporal boundaries. Finally, the CDC network demonstrates a very high efficiency with the ability to process 500 frames per second on a single GPU server. We will update the camera-ready version and publish the source codes online soon.
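The abstract describes the CDC filter as simultaneously upsampling in time and downsampling in space on top of 3D ConvNet features. The following is a minimal PyTorch sketch of that idea, not the authors' released code: the joint CDC operation is approximated here by factoring it into a 1x4x4 spatial convolution that collapses the spatial feature map, followed by a temporal transposed convolution that doubles the number of time steps. The class name `CDCBlockSketch`, the channel sizes, and the two-step factoring are hypothetical choices made only for illustration.

```python
import torch
import torch.nn as nn


class CDCBlockSketch(nn.Module):
    """Illustrative sketch of a CDC-style layer (not the paper's implementation).

    Input:  3D ConvNet features of shape (N, C_in, T, 4, 4), where T is the
            temporally downsampled clip length.
    Output: shape (N, C_out, 2*T, 1, 1) -- time upsampled x2, space collapsed.

    The paper's CDC filter performs both operations jointly with one learned
    kernel; here they are factored into two standard 3D (de)convolutions.
    """

    def __init__(self, c_in: int = 512, c_out: int = 512):
        super().__init__()
        # Spatial downsampling: 4x4 -> 1x1, temporal length unchanged.
        self.spatial_down = nn.Conv3d(c_in, c_out, kernel_size=(1, 4, 4))
        # Temporal upsampling: length doubled via a transposed conv in time.
        self.temporal_up = nn.ConvTranspose3d(
            c_out, c_out, kernel_size=(2, 1, 1), stride=(2, 1, 1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.spatial_down(x))   # (N, c_out, T, 1, 1)
        x = self.relu(self.temporal_up(x))    # (N, c_out, 2*T, 1, 1)
        return x


if __name__ == "__main__":
    # Example: hypothetical 512-channel features over 4 time steps and a 4x4 grid.
    feats = torch.randn(1, 512, 4, 4, 4)
    out = CDCBlockSketch()(feats)
    print(out.shape)  # torch.Size([1, 512, 8, 1, 1])
```

Stacking such blocks until the temporal resolution matches the input frame rate, and attaching a per-frame classifier, gives the frame-level granularity the abstract refers to; the exact layer configuration used in the paper is not reproduced here.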
